BlindCall: ultra-fast base-calling of high-throughput sequencing data by blind deconvolution

Authors

  • Chengxi Ye
  • Chiaowen Hsiao
  • Héctor Corrada Bravo

Abstract

MOTIVATION: Base-calling of sequencing data produced by high-throughput sequencing platforms is a fundamental process in current bioinformatics analysis. However, existing third-party probabilistic or machine-learning methods that significantly improve the accuracy of base-calls on these platforms are impractical for production use because of their computational inefficiency. RESULTS: We formulate base-calling directly as a blind deconvolution problem and implement BlindCall as an efficient solver for this inverse problem. BlindCall produces base-calls at an accuracy comparable to state-of-the-art probabilistic methods while processing data 10 times faster in most cases. The computational complexity of BlindCall scales linearly with read length, making it better suited to new long-read sequencing technologies.
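
To illustrate what treating base-calling as a blind problem can mean, here is a toy sketch. It is not BlindCall's algorithm. It assumes a much simpler model in which each cycle's four-channel intensity vector is one row of an unknown 4x4 crosstalk matrix plus small noise, and it alternates between calling bases with the matrix fixed and re-estimating the matrix with the calls fixed. All names (`blind_call`, `TRUE_MIXING`, `simulate`) and all numeric values are hypothetical and chosen only for illustration. Each pass costs time proportional to the number of cycles.

```python
BASES = "ACGT"

def sq_dist(u, v):
    """Squared Euclidean distance between two intensity vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def call_bases(intensities, mixing):
    """With the mixing matrix fixed, pick the best-fitting base per cycle."""
    return [min(range(4), key=lambda b: sq_dist(y, mixing[b]))
            for y in intensities]

def estimate_mixing(intensities, calls, previous):
    """With the calls fixed, the least-squares row for each base is the
    mean of the intensity vectors assigned to that base."""
    mixing = []
    for b in range(4):
        rows = [y for y, c in zip(intensities, calls) if c == b]
        if rows:
            mixing.append([sum(col) / len(rows) for col in zip(*rows)])
        else:
            mixing.append(list(previous[b]))  # no data: keep the old row
    return mixing

def blind_call(intensities, iterations=10):
    """Alternate the two steps, starting from an identity mixing matrix."""
    mixing = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    calls = call_bases(intensities, mixing)
    for _ in range(iterations):
        mixing = estimate_mixing(intensities, calls, mixing)
        new_calls = call_bases(intensities, mixing)
        if new_calls == calls:  # converged
            break
        calls = new_calls
    return "".join(BASES[c] for c in calls), mixing

# Hypothetical crosstalk: A/C channels overlap, and so do G/T.
TRUE_MIXING = [
    [1.0, 0.6, 0.0, 0.0],
    [0.3, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.4, 1.0],
]
TRUE_SEQ = "ACGTTGCAAGCTTACG"

def simulate(seq):
    """Synthetic intensities with a small deterministic perturbation."""
    out = []
    for i, base in enumerate(seq):
        row = TRUE_MIXING[BASES.index(base)]
        out.append([v + 0.02 * ((7 * i + 3 * c) % 5 - 2)
                    for c, v in enumerate(row)])
    return out

called, mixing = blind_call(simulate(TRUE_SEQ))
print(called == TRUE_SEQ)
```

The identity starting point only works here because the simulated crosstalk is moderate. A realistic base-caller would also have to model effects this sketch ignores, such as signal carried over between cycles.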

Similar resources

An adaptive decorrelation method removes Illumina DNA base-calling errors caused by crosstalk between adjacent clusters

Base-calling accuracy is crucial for high-throughput DNA sequencing and downstream analysis such as read mapping and genome assembly. Accordingly, we sought to reduce DNA sequencing errors of Illumina systems by correcting three kinds of crosstalk in the cluster intensity data. We discovered that signal crosstalk between adjacent clusters accounts for a large portion of sequencing err...

naiveBayesCall: An Efficient Model-Based Base-Calling Algorithm for High-Throughput Sequencing

Immense amounts of raw instrument data (i.e., images of fluorescence) are currently being generated using ultra high-throughput sequencing platforms. An important computational challenge associated with this rapid advancement is to develop efficient algorithms that can extract accurate sequence information from raw data. To address this challenge, we recently introduced a novel model-based base...

DNA sequencing and parametric deconvolution

One of the key practices of the Human genome project is Sanger DNA sequencing. Its data analysis part is called base-calling, which attempts to reconstruct target DNA sequences from fluorescence intensities generated by sequencing machines. In this paper, we present our modeling framework of DNA sequencing, in which a base-calling scheme arises naturally. A large portion of DNA sequencing error...

SNP calling using genotype model selection on high-throughput sequencing data

MOTIVATION A review of the available single nucleotide polymorphism (SNP) calling procedures for Illumina high-throughput sequencing (HTS) platform data reveals that most rely mainly on base-calling and mapping qualities as sources of error when calling SNPs. Thus, errors not involved in base-calling or alignment, such as those in genomic sample preparation, are not accounted for. RESULTS A n...

meRanTK: methylated RNA analysis ToolKit

UNLABELLED The significance and function of posttranscriptional cytosine methylation in poly(A)RNA attracts great interest but is still poorly understood. High-throughput sequencing of RNA treated with bisulfite (RNA-BSseq) or subjected to enrichment techniques like Aza-IP or miCLIP enables transcriptome wide studies of this particular modification at single base pair resolution. However, to da...

Journal title:

Volume 30  Issue 

Pages  -

Publication date: 2014